Overview

Dataset statistics

Number of variables: 8
Number of observations: 1552
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 91.1 KiB
Average record size in memory: 60.1 B
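
The average record size appears to be the total in-memory size divided by the number of observations. The sketch below checks that arithmetic with the two figures reported above; the variable names are illustrative and not part of the report.

```python
# Values copied from the dataset statistics above.
total_kib = 91.1          # total size in memory, in KiB
n_observations = 1552     # number of rows

# 1 KiB = 1024 bytes; dividing by the row count gives bytes per record.
avg_record_bytes = total_kib * 1024 / n_observations
print(f"{avg_record_bytes:.1f} B")
```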

Variable types

Numeric: 8

Alerts

profit is highly correlated with recencydays and 1 other field [High correlation]
recencydays is highly correlated with profit [High correlation]
qtd_items is highly correlated with profit and 1 other field [High correlation]
avg_basket_size is highly correlated with qtd_items [High correlation]
profit is highly correlated with qtd_items [High correlation]
qtd_items is highly correlated with profit and 1 other field [High correlation]
avg_ticket is highly correlated with qtd_items [High correlation]
profit is highly correlated with qtd_items [High correlation]
qtd_items is highly correlated with profit [High correlation]
df_index is highly correlated with recencydays [High correlation]
profit is highly correlated with qtd_items and 1 other field [High correlation]
recencydays is highly correlated with df_index [High correlation]
qtd_items is highly correlated with profit and 1 other field [High correlation]
avg_ticket is highly correlated with profit and 1 other field [High correlation]
avg_ticket is highly skewed (γ1 = 27.98323513) [Skewed]
df_index has unique values [Unique]
customerid has unique values [Unique]
recencydays has 25 (1.6%) zeros [Zeros]
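
Only avg_ticket carries a Skewed alert, even though several other variables are also strongly skewed. A cutoff of about 20 on the absolute skewness would be consistent with that. The sketch below applies such a cutoff to the skewness values from the per-variable statistics in this report. The threshold of 20 and the function name are assumptions made for illustration, not values read from the report's configuration.

```python
# Skewness values copied from the per-variable descriptive statistics.
skewness = {
    "df_index": 0.6626497421,
    "customerid": 0.09372121084,
    "profit": 12.24371597,
    "recencydays": 1.843840174,
    "qtd_items": 19.64767536,
    "avg_ticket": 27.98323513,
    "frequency": 17.00132451,
    "avg_basket_size": 18.49588505,
}

# Assumed cutoff (hypothetical; the real value would be in config.json).
SKEW_THRESHOLD = 20

def skewed_columns(skew_by_column, threshold=SKEW_THRESHOLD):
    """Return the columns whose absolute skewness exceeds the threshold."""
    return [name for name, g1 in skew_by_column.items() if abs(g1) > threshold]

flagged = skewed_columns(skewness)
print(flagged)
```

Under that assumed cutoff, qtd_items (19.65) falls just below the line, which matches its absence from the alert list.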

Reproduction

Analysis started: 2022-09-23 09:03:35.618123
Analysis finished: 2022-09-23 09:04:10.743121
Duration: 35.12 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json

Variables

df_index
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 1552
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2664.299613
Minimum: 0
Maximum: 7757
Zeros: 1
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 12.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 176.55
Q1: 920.75
Median: 2175.5
Q3: 4135.75
95-th percentile: 6536.05
Maximum: 7757
Range: 7757
Interquartile range (IQR): 3215

Descriptive statistics

Standard deviation: 2057.528605
Coefficient of variation (CV): 0.7722587184
Kurtosis: -0.6910188006
Mean: 2664.299613
Median Absolute Deviation (MAD): 1440.5
Skewness: 0.6626497421
Sum: 4134993
Variance: 4233423.96
Monotonicity: Strictly increasing
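
Several of these figures can be derived from one another, which gives a quick consistency check on the report. The sketch below uses the df_index values listed above and assumes the usual definitions: CV as standard deviation over mean, range as maximum minus minimum, IQR as Q3 minus Q1, and variance as the squared standard deviation.

```python
import math

# Values copied from the df_index statistics in this report.
mean, std, variance = 2664.299613, 2057.528605, 4233423.96
minimum, maximum = 0, 7757
q1, q3 = 920.75, 4135.75

cv = std / mean                    # coefficient of variation
value_range = maximum - minimum    # range
iqr = q3 - q1                      # interquartile range

print(round(cv, 6), value_range, iqr)
# The variance should match the squared standard deviation up to rounding.
print(math.isclose(std ** 2, variance, rel_tol=1e-6))
```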
Histogram with fixed size bins (bins=50)
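
As a rough illustration of what fixed-size binning means, the sketch below splits the span from the minimum to the maximum into equal-width intervals and counts the values in each. It is a minimal pure-Python version with a hypothetical function name, not the plotting code used to render the report.

```python
def fixed_width_histogram(values, bins=50):
    """Count values into `bins` equal-width intervals spanning [min, max]."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # Degenerate case: every value is identical, so use a single bin.
        return [len(values)]
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum lands on the last edge; clamp it into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts

# Toy data (not from the report): ten integers in five bins.
counts = fixed_width_histogram([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], bins=5)
print(counts)
```

With equal-width bins, heavily skewed variables such as avg_ticket place almost all observations in the first few bins.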
Common values
Value                Count  Frequency (%)
0                    1      0.1%
3190                 1      0.1%
3385                 1      0.1%
3384                 1      0.1%
3378                 1      0.1%
3368                 1      0.1%
3362                 1      0.1%
3359                 1      0.1%
3355                 1      0.1%
3328                 1      0.1%
Other values (1542)  1542   99.4%

Minimum 10 values
Value                Count  Frequency (%)
0                    1      0.1%
1                    1      0.1%
2                    1      0.1%
4                    1      0.1%
5                    1      0.1%
6                    1      0.1%
7                    1      0.1%
8                    1      0.1%
12                   1      0.1%
14                   1      0.1%

Maximum 10 values
Value                Count  Frequency (%)
7757                 1      0.1%
7678                 1      0.1%
7620                 1      0.1%
7597                 1      0.1%
7553                 1      0.1%
7510                 1      0.1%
7504                 1      0.1%
7501                 1      0.1%
7499                 1      0.1%
7472                 1      0.1%

customerid
Real number (ℝ≥0)

UNIQUE

Distinct: 1552
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15179.49742
Minimum: 12346
Maximum: 18282
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.2 KiB

Quantile statistics

Minimum: 12346
5-th percentile: 12584.55
Q1: 13701.5
Median: 15123.5
Q3: 16662.25
95-th percentile: 17976.8
Maximum: 18282
Range: 5936
Interquartile range (IQR): 2960.75

Descriptive statistics

Standard deviation: 1723.792714
Coefficient of variation (CV): 0.113560592
Kurtosis: -1.174294211
Mean: 15179.49742
Median Absolute Deviation (MAD): 1493
Skewness: 0.09372121084
Sum: 23558580
Variance: 2971461.322
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value                Count  Frequency (%)
17850                1      0.1%
12949                1      0.1%
13995                1      0.1%
14619                1      0.1%
13005                1      0.1%
15754                1      0.1%
13950                1      0.1%
14912                1      0.1%
18272                1      0.1%
13771                1      0.1%
Other values (1542)  1542   99.4%

Minimum 10 values
Value                Count  Frequency (%)
12346                1      0.1%
12352                1      0.1%
12359                1      0.1%
12362                1      0.1%
12365                1      0.1%
12375                1      0.1%
12379                1      0.1%
12380                1      0.1%
12381                1      0.1%
12383                1      0.1%

Maximum 10 values
Value                Count  Frequency (%)
18282                1      0.1%
18277                1      0.1%
18276                1      0.1%
18274                1      0.1%
18272                1      0.1%
18270                1      0.1%
18269                1      0.1%
18268                1      0.1%
18263                1      0.1%
18260                1      0.1%

profit
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 1550
Distinct (%): 99.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4647.002965
Minimum: 12.4
Maximum: 336942.1
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 12.2 KiB

Quantile statistics

Minimum: 12.4
5-th percentile: 279.066
Q1: 750.48
Median: 1656.615
Q3: 3550.115
95-th percentile: 11690.5345
Maximum: 336942.1
Range: 336929.7
Interquartile range (IQR): 2799.635

Descriptive statistics

Standard deviation: 17274.9834
Coefficient of variation (CV): 3.717446174
Kurtosis: 184.1829746
Mean: 4647.002965
Median Absolute Deviation (MAD): 1078.165
Skewness: 12.24371597
Sum: 7212148.602
Variance: 298425051.3
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value                Count  Frequency (%)
398.3                2      0.1%
2430.04              2      0.1%
2887.29              1      0.1%
461.47               1      0.1%
761.56               1      0.1%
894.18               1      0.1%
3092.38              1      0.1%
2637.2               1      0.1%
1525.31              1      0.1%
1545.24              1      0.1%
Other values (1540)  1540   99.2%

Minimum 10 values
Value                Count  Frequency (%)
12.4                 1      0.1%
21.95                1      0.1%
26.6                 1      0.1%
39.8                 1      0.1%
51                   1      0.1%
63.45                1      0.1%
64.25                1      0.1%
70.23                1      0.1%
78.15                1      0.1%
93.43                1      0.1%

Maximum 10 values
Value                Count  Frequency (%)
336942.1             1      0.1%
280923.02            1      0.1%
262876.11            1      0.1%
201619.41            1      0.1%
155077.5             1      0.1%
154367.2             1      0.1%
126103.61            1      0.1%
121375.12            1      0.1%
111057.07            1      0.1%
93999.38             1      0.1%

recencydays
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 244
Distinct (%): 15.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 64.96778351
Minimum: 0
Maximum: 373
Zeros: 25
Zeros (%): 1.6%
Negative: 0
Negative (%): 0.0%
Memory size: 12.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 10
Median: 28
Q3: 78
95-th percentile: 274
Maximum: 373
Range: 373
Interquartile range (IQR): 68

Descriptive statistics

Standard deviation: 84.65967212
Coefficient of variation (CV): 1.303102362
Kurtosis: 2.673103827
Mean: 64.96778351
Median Absolute Deviation (MAD): 24
Skewness: 1.843840174
Sum: 100830
Variance: 7167.260083
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value                Count  Frequency (%)
1                    63     4.1%
2                    59     3.8%
8                    50     3.2%
3                    49     3.2%
4                    47     3.0%
22                   36     2.3%
16                   35     2.3%
17                   35     2.3%
10                   34     2.2%
7                    34     2.2%
Other values (234)   1110   71.5%

Minimum 10 values
Value                Count  Frequency (%)
0                    25     1.6%
1                    63     4.1%
2                    59     3.8%
3                    49     3.2%
4                    47     3.0%
5                    21     1.4%
7                    34     2.2%
8                    50     3.2%
9                    30     1.9%
10                   34     2.2%

Maximum 10 values
Value                Count  Frequency (%)
373                  2      0.1%
372                  3      0.2%
371                  1      0.1%
369                  1      0.1%
368                  1      0.1%
366                  3      0.2%
365                  3      0.2%
364                  1      0.1%
360                  1      0.1%
359                  1      0.1%

qtd_items
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 619
Distinct (%): 39.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 565.7048969
Minimum: 1
Maximum: 80996
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 12.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 38
Q1: 100
Median: 183
Q3: 346.25
95-th percentile: 1298.7
Maximum: 80996
Range: 80995
Interquartile range (IQR): 246.25

Descriptive statistics

Standard deviation: 3206.037822
Coefficient of variation (CV): 5.667332631
Kurtosis: 448.8159632
Mean: 565.7048969
Median Absolute Deviation (MAD): 103.5
Skewness: 19.64767536
Sum: 877974
Variance: 10278678.51
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value                Count  Frequency (%)
52                   15     1.0%
70                   13     0.8%
106                  12     0.8%
60                   12     0.8%
118                  10     0.6%
87                   10     0.6%
117                  10     0.6%
189                  9      0.6%
120                  9      0.6%
66                   9      0.6%
Other values (609)   1443   93.0%

Minimum 10 values
Value                Count  Frequency (%)
1                    1      0.1%
2                    1      0.1%
3                    1      0.1%
4                    1      0.1%
6                    1      0.1%
9                    2      0.1%
11                   1      0.1%
12                   5      0.3%
15                   1      0.1%
16                   3      0.2%

Maximum 10 values
Value                Count  Frequency (%)
80996                1      0.1%
74215                1      0.1%
38639                1      0.1%
17376                1      0.1%
17150                1      0.1%
16288                1      0.1%
15853                1      0.1%
13369                1      0.1%
12872                1      0.1%
10828                1      0.1%

avg_ticket
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
SKEWED

Distinct: 1550
Distinct (%): 99.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 130.6282729
Minimum: 2.241
Maximum: 77183.6
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 12.2 KiB

Quantile statistics

Minimum: 2.241
5-th percentile: 6.159015874
Q1: 15.48176257
Median: 19.12091844
Q3: 26.80088617
95-th percentile: 95.14292179
Maximum: 77183.6
Range: 77181.359
Interquartile range (IQR): 11.31912361

Descriptive statistics

Standard deviation: 2447.777236
Coefficient of variation (CV): 18.73849498
Kurtosis: 810.7275853
Mean: 130.6282729
Median Absolute Deviation (MAD): 4.969503921
Skewness: 27.98323513
Sum: 202735.0796
Variance: 5991613.398
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value                Count  Frequency (%)
16.83333333          2      0.1%
25.5                 2      0.1%
15.289               1      0.1%
19.75769231          1      0.1%
16.3775              1      0.1%
21.28742857          1      0.1%
32.14259259          1      0.1%
18.54566265          1      0.1%
22.13823529          1      0.1%
73.7905              1      0.1%
Other values (1540)  1540   99.2%

Minimum 10 values
Value                Count  Frequency (%)
2.241                1      0.1%
2.264375             1      0.1%
2.817681159          1      0.1%
3.1                  1      0.1%
3.140802469          1      0.1%
3.157113402          1      0.1%
3.269333333          1      0.1%
3.45                 1      0.1%
3.487294118          1      0.1%
3.734567219          1      0.1%

Maximum 10 values
Value                Count  Frequency (%)
77183.6              1      0.1%
56157.5              1      0.1%
13305.5              1      0.1%
4453.43              1      0.1%
2027.86              1      0.1%
952.9875             1      0.1%
931.5                1      0.1%
835.864              1      0.1%
643.8585714          1      0.1%
602.4531323          1      0.1%

frequency
Real number (ℝ≥0)

Distinct: 873
Distinct (%): 56.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.2001652599
Minimum: 0.005479452055
Maximum: 17
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 12.2 KiB

Quantile statistics

Minimum: 0.005479452055
5-th percentile: 0.01078465562
Q1: 0.01953125
Median: 0.03109215265
Q3: 0.06598173516
95-th percentile: 1
Maximum: 17
Range: 16.99452055
Interquartile range (IQR): 0.04645048516

Descriptive statistics

Standard deviation: 0.5742324675
Coefficient of variation (CV): 2.868791856
Kurtosis: 474.116048
Mean: 0.2001652599
Median Absolute Deviation (MAD): 0.01509215265
Skewness: 17.00132451
Sum: 310.6564834
Variance: 0.3297429267
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value                Count  Frequency (%)
1                    216    13.9%
0.02941176471        9      0.6%
0.03703703704        8      0.5%
0.08333333333        8      0.5%
0.01923076923        7      0.5%
0.02                 7      0.5%
0.02173913043        7      0.5%
2                    7      0.5%
0.02857142857        6      0.4%
0.02409638554        6      0.4%
Other values (863)   1271   81.9%

Minimum 10 values
Value                Count  Frequency (%)
0.005479452055       1      0.1%
0.005681818182       1      0.1%
0.005714285714       1      0.1%
0.00583090379        1      0.1%
0.005899705015       1      0.1%
0.005952380952       2      0.1%
0.006006006006       1      0.1%
0.006451612903       1      0.1%
0.00651465798        1      0.1%
0.006600660066       1      0.1%

Maximum 10 values
Value                Count  Frequency (%)
17                   1      0.1%
4                    1      0.1%
3                    2      0.1%
2                    7      0.5%
1.142857143          1      0.1%
1                    216    13.9%
0.75                 1      0.1%
0.6666666667         1      0.1%
0.550802139          1      0.1%
0.5335120643         1      0.1%

avg_basket_size
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1283
Distinct (%): 82.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.008494577938
Minimum: 1.347436502 × 10^-5
Maximum: 0.6666666667
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 12.2 KiB

Quantile statistics

Minimum: 1.347436502 × 10^-5
5-th percentile: 0.001460699064
Q1: 0.003302374003
Median: 0.005154639175
Q3: 0.008443806649
95-th percentile: 0.01967629123
Maximum: 0.6666666667
Range: 0.6666531923
Interquartile range (IQR): 0.005141432646

Descriptive statistics

Standard deviation: 0.0267436409
Coefficient of variation (CV): 3.148318974
Kurtosis: 388.5742713
Mean: 0.008494577938
Median Absolute Deviation (MAD): 0.00225742778
Skewness: 18.49588505
Sum: 13.18358496
Variance: 0.0007152223284
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value                Count  Frequency (%)
0.008771929825       6      0.4%
0.005263157895       5      0.3%
0.01162790698        5      0.3%
0.003472222222       5      0.3%
0.01219512195        5      0.3%
0.004807692308       4      0.3%
0.004854368932       4      0.3%
0.005291005291       4      0.3%
0.007246376812       4      0.3%
0.007352941176       4      0.3%
Other values (1273)  1506   97.0%

Minimum 10 values
Value                Count  Frequency (%)
1.347436502 × 10^-5  1      0.1%
2.469227255 × 10^-5  1      0.1%
0.0001664078101      1      0.1%
0.000270374662       1      0.1%
0.0003747006193      1      0.1%
0.0004580432393      1      0.1%
0.0004628915291      1      0.1%
0.0004669624095      1      0.1%
0.0004802553099      1      0.1%
0.0005119017149      1      0.1%

Maximum 10 values
Value                Count  Frequency (%)
0.6666666667         1      0.1%
0.5                  2      0.1%
0.25                 1      0.1%
0.1875               1      0.1%
0.1627906977         1      0.1%
0.08333333333        2      0.1%
0.07692307692        1      0.1%
0.05555555556        1      0.1%
0.05263157895        1      0.1%
0.04529616725        1      0.1%

Interactions

[Figure: 8 × 8 grid of pairwise interaction plots, one for each pair of variables]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
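
The calculation described above can be sketched in plain Python. This is an illustrative version with hypothetical helper names and toy data, not the code the report itself runs; tied values are given the average of their rank positions, which is one common convention.

```python
import math

def average_ranks(values):
    """Assign ranks 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks

def covariance(a, b):
    """Population covariance of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / len(a)

def spearman_rho(x, y):
    """Covariance of the ranks divided by the product of their std devs."""
    rx, ry = average_ranks(x), average_ranks(y)
    return covariance(rx, ry) / math.sqrt(covariance(rx, rx) * covariance(ry, ry))

# Toy data (not from the report): y grows monotonically but nonlinearly with x.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
rho = spearman_rho(x, y)
print(round(rho, 6))
```

Because only the ranks matter, the quadratic relationship in the toy data yields a ρ of 1 (up to floating-point rounding) even though the relationship is not linear.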

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that, for a linear relationship, the steepness of the line (its angle to the x-axis) does not affect r as long as the direction of the relationship is unchanged.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
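
A minimal sketch of this formula follows, using a hypothetical function name and toy data rather than anything from the report. It also shifts and rescales one variable to illustrate the invariance mentioned above. The function assumes neither input is constant, since a zero standard deviation would make the ratio undefined.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their std devs."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    return cov_xy / math.sqrt(var_x * var_y)

# Toy data (not from the report).
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson_r(x, y)

# Changing the location and (positive) scale of x leaves r unchanged.
r_rescaled = pearson_r([2 * v + 3 for v in x], y)
print(round(r, 6), round(r_rescaled, 6))
```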

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
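
The pair-counting definition above can be written directly as a short function. Taken literally, it is the variant usually called tau-a; libraries such as SciPy default to tau-b, which adjusts the denominator for ties, so results can differ on tied data. The function name and toy data are illustrative only.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    n = len(x)
    if n < 2:
        raise ValueError("at least two observations are required")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1      # both variables move in the same direction
        elif product < 0:
            discordant += 1      # the variables move in opposite directions
        # A product of zero means a tie, which counts as neither.
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

# Toy data (not from the report): one swapped pair out of six.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(round(tau, 4))
```

This version compares every pair, so its cost grows quadratically with the number of observations.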

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.

Sample

First rows

      df_index  customerid     profit  recencydays  qtd_items  avg_ticket  frequency  avg_basket_size
   0         0       17850    5493.79        372.0       35.0   18.152222  17.000000         0.019619
   1         1       13047    3395.98         31.0      132.0   18.822907   0.028302         0.007189
   2         2       12583    7375.42          2.0     1569.0   29.479271   0.040323         0.002964
   3         4       15100    1116.90        333.0       48.0  292.000000   0.073171         0.037500
   4         5       15291    4740.09         25.0      508.0   45.323301   0.040115         0.007133
   5         6       14688    6154.36          7.0      579.0   17.219786   0.057221         0.005800
   6         7       17809    6196.20         16.0      961.0   88.719836   0.033520         0.005834
   7         8       15311   62116.46          0.0     2167.0   25.543464   0.243316         0.002383
   8        12       16029  111057.07         38.0    10828.0  334.813388   0.184524         0.001567
   9        14       12431    6558.51         35.0     1130.0   27.489195   0.044248         0.005141

Last rows

      df_index  customerid     profit  recencydays  qtd_items  avg_ticket  frequency  avg_basket_size
1542      7472       15877     545.78          1.0      177.0    3.823876   0.117647         0.005391
1543      7499       12586     213.94         17.0       56.0   17.903636   1.000000         0.012658
1544      7501       16376     996.50          8.0      276.0    7.896080   0.200000         0.002882
1545      7504       12452     432.57         16.0       95.0   19.571364   1.000000         0.010811
1546      7510       18084      93.43         16.0      312.0   90.480000   1.000000         0.003205
1547      7553       17727    1077.95         15.0      111.0   16.064394   1.000000         0.001550
1548      7597       12479     577.10         11.0       87.0   17.006452   1.000000         0.002597
1549      7620       14126     768.63          7.0      361.0   47.075333   0.750000         0.005906
1550      7678       12558     539.92          7.0      102.0   24.541818   1.000000         0.005102
1551      7757       14087     207.17          2.0      113.0    2.817681   1.000000         0.003984